Contextual Gaussian Process Bandit Optimization
Authors
Abstract
How should we design experiments to maximize performance of a complex system, taking into account uncontrollable environmental conditions? How should we select relevant documents (ads) to display, given information about the user? These tasks can be formalized as contextual bandit problems, where at each round, we receive context (about the experimental conditions, the query), and have to choose an action (parameters, documents). The key challenge is to trade off exploration, by gathering data for estimating the mean payoff function over the context-action space, and exploitation, by choosing an action deemed optimal based on the gathered data. We model the payoff function as a sample from a Gaussian process defined over the joint context-action space, and develop CGP-UCB, an intuitive upper-confidence style algorithm. We show that by mixing and matching kernels for contexts and actions, CGP-UCB can handle a variety of practical applications. We further provide generic tools for deriving regret bounds when using such composite kernel functions. Lastly, we evaluate our algorithm on two case studies, in the context of automated vaccine design and sensor management. We show that context-sensitive optimization outperforms no or naive use of context.
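To make the selection rule concrete, below is a minimal Python sketch of a CGP-UCB-style loop; this is not the authors' code. It uses an anisotropic RBF kernel over the concatenated (context, action) vector, which for RBF kernels coincides with a product of a context kernel and an action kernel, and applies an upper-confidence rule over a finite candidate action set. The class name, the constant exploration weight beta (the paper uses a schedule beta_t), and the finite action set are simplifying assumptions.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

class CGPUCB:
    """Sketch of a contextual GP-UCB loop; names and constants are illustrative."""

    def __init__(self, actions, beta=2.0, noise=0.1):
        self.actions = np.asarray(actions)   # finite candidate action set, shape (K, m)
        self.beta = beta                     # constant stand-in for the paper's beta_t schedule
        self.noise = noise                   # assumed observation noise std
        self.X, self.y = [], []

    def select(self, context):
        context = np.asarray(context)
        # Pair the observed context with every candidate action.
        Z = np.hstack([np.tile(context, (len(self.actions), 1)), self.actions])
        if not self.X:                       # no data yet: pick arbitrarily
            return self.actions[0]
        # Anisotropic RBF on (context, action) = product of context and action RBFs.
        kernel = RBF(length_scale=np.ones(Z.shape[1]))
        gp = GaussianProcessRegressor(kernel=kernel, alpha=self.noise ** 2)
        gp.fit(np.asarray(self.X), np.asarray(self.y))
        mu, sigma = gp.predict(Z, return_std=True)
        # Upper-confidence rule: maximize posterior mean plus scaled uncertainty.
        return self.actions[np.argmax(mu + np.sqrt(self.beta) * sigma)]

    def update(self, context, action, payoff):
        self.X.append(np.concatenate([np.asarray(context), np.asarray(action)]))
        self.y.append(payoff)

Refitting the GP from scratch each round keeps the sketch short; an efficient implementation would update the posterior incrementally.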
Similar Papers
Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation
The Multi-Armed Bandit (MAB) framework has been successfully applied in many web applications. However, many complex real-world applications that involve multiple content recommendations cannot fit into the traditional MAB setting. To address this issue, we consider an ordered combinatorial semi-bandit problem where the learner recommends S actions from a base set of K actions, and displays the res...
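The snippet is truncated, but a standard baseline for such semi-bandit problems scores each base action with a UCB index and plays the S highest-scoring actions in score order (a CUCB-style rule; whether this matches the paper's own algorithm is an assumption, and the function below is illustrative only).

import numpy as np

def ucb_top_s(means, counts, t, S):
    """Return the indices of the S base actions with the largest UCB index at round t >= 1."""
    counts = np.maximum(counts, 1)           # clip to avoid division by zero for unplayed arms
    ucb = means + np.sqrt(1.5 * np.log(max(t, 1)) / counts)
    return np.argsort(ucb)[::-1][:S]         # best-scoring action in the first slot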
On 2-armed Gaussian Bandits and Optimization
We explore the 2-armed bandit with Gaussian payoffs as a theoretical model for optimization. We formulate the problem from a Bayesian perspective, and provide the optimal strategy for both 1 and 2 pulls. We present regions of parameter space where a greedy strategy is provably optimal. We also compare the greedy and optimal strategies to a genetic-algorithm-based strategy. In doing so we correct...
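For intuition, the greedy strategy in a conjugate-Gaussian version of this setup (our simplification: known noise variance and independent Gaussian priors on the arm means) simply pulls the arm with the higher posterior mean:

import numpy as np

def greedy_pull(prior_mu, prior_var, sums, counts, noise_var=1.0):
    """Pick the arm with the higher posterior mean payoff (2-armed Gaussian bandit)."""
    post_mu = np.empty(2)
    for a in range(2):
        # Standard Gaussian conjugate update with known noise variance.
        precision = 1.0 / prior_var[a] + counts[a] / noise_var
        post_mu[a] = (prior_mu[a] / prior_var[a] + sums[a] / noise_var) / precision
    return int(np.argmax(post_mu))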
Nonparametric Contextual Bandit Optimization via Random Approximation
We examine the stochastic contextual bandit problem in a novel continuous-action setting where the policy lies in a reproducing kernel Hilbert space (RKHS). This provides a framework to handle continuous policy and action spaces in a tractable manner while retaining polynomial regret bounds, in contrast with much prior work in the continuous setting. We extend an optimization perspective that h...
Learning and decisions in contextual multi-armed bandit tasks
Contextual Multi-Armed Bandit (CMAB) tasks are a novel framework to assess decision making in uncertain environments. In a CMAB task, participants are presented with multiple options (arms) which are characterized by a number of features (context) related to the reward associated with the arms. By choosing arms repeatedly and observing the reward, participants can learn about the relation betwe...
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP...
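For reference, the GP-UCB selection rule analyzed in this work picks, at round t (in LaTeX notation),

x_t = \arg\max_{x \in D} \; \mu_{t-1}(x) + \beta_t^{1/2}\, \sigma_{t-1}(x)

where \mu_{t-1} and \sigma_{t-1} are the GP posterior mean and standard deviation given the first t-1 observations, and \beta_t controls the exploration-exploitation trade-off. CGP-UCB above applies the same rule over the joint context-action space.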